(57) Abstract: Methods and systems for time-frequency domain watermarking of media signals, such as audio and video signals.

An encoding method divides the media signal into segments, transforms each segment into a time-frequency representation, and
computes a time-frequency domain watermark signal based on the time-frequency representation. It then combines the time-frequency
domain watermark signal with the media signal to produce a watermarked media signal. To embed a message using this method, one
may use peak modulation, pseudorandom noise modulation, statistical feature modulation, etc. Watermarking in the time-frequency
domain enables the encoder to perceptually model time and frequency attributes of the media signal simultaneously. A watermark
decoder uses a calibration signal to detect the watermark signal in a potentially distorted version of the watermarked signal. The
calibration signal may also be used to determine the watermark's alignment and scaling. After compensating for the alignment and
scaling, a watermark reader extracts an embedded message from a time-frequency representation of the media signal.

Watermarking in the Time-Frequency Domain

Field of the Invention: 

The invention relates to digital watermarks and more particularly to watermarking media 
signals using time-frequency representations of those signals to compute the watermark. 


Background and Summary: 

Digital watermarking is a process for modifying physical or electronic media to embed a 
machine-readable code into the media. The media may be modified such that the embedded 
code is imperceptible or nearly imperceptible to the user, yet may be detected through an 
automated detection process. Most commonly, digital watermarking is applied to media signals
such as images, audio signals, and video signals. However, it may also be applied to other 
types of media objects, including documents (e.g., through line, word or character shifting), 
software, multi-dimensional graphics models, and surface textures of objects. 

Digital watermarking systems typically have two primary components: an encoder that embeds
the watermark in a host media signal, and a decoder that detects and reads the embedded 
watermark from a signal suspected of containing a watermark (a suspect signal). The encoder 
embeds a watermark by altering the host media signal. The reading component analyzes a 
suspect signal to detect whether a watermark is present. In applications where the watermark
encodes information, the reader extracts this information from the detected watermark.

Several particular watermarking techniques have been developed. The reader is presumed to be 
familiar with the literature in this field. Particular techniques for embedding and detecting 
imperceptible watermarks in media signals are detailed in the assignee's co-pending application
serial number 09/503,881 and US Patent Nos. 5,862,260 and 6,122,403, which are hereby
incorporated by reference. 

The invention provides methods and systems for time-frequency domain watermarking of 
media signals, such as audio and video signals. One aspect of the invention is a method of
watermarking a media signal with a temporal component. The method divides the media signal
into segments, transforms each segment into a time-frequency spectrogram, and computes a 
time-frequency domain watermark signal based on the time frequency spectrogram. It then 
combines the time-frequency domain watermark signal with the media signal to produce a 
watermarked media signal. To embed a message using this method, one may use peak
modulation, pseudorandom noise modulation, statistical feature modulation, etc. Watermarking
in the time-frequency domain enables the encoder to perceptually model time and frequency
attributes of the media signal simultaneously.

Another watermark encoding method divides at least a portion of the media signal into 
segments and processes each segment as follows. It moves a window along the media signal in
the segment and repeatedly applies a frequency transform to the media signal in each window 
to generate a time-frequency representation. It computes a perceptually adaptive watermark in 
the time-frequency domain, converts the watermark signal to the time domain using an inverse 
frequency transform and repeats the process until each segment has been processed. Finally, it 
adds the watermark signal to the media signal to generate a watermarked media signal.

Another aspect of the invention is a method of decoding a watermark from a media signal. The 
method transforms the media signal to a time frequency representation, computes elements of a 
message signal embedded into the media signal from the time frequency representation, and 
decodes a message from the elements. The elements may be message signal elements of an
antipodal, pseudorandom noise based watermark, or message signal elements of some other 
type of watermark signal, such as statistical feature modulation signal, peak modulation signal, 
echo modulation signal, etc. 

Another aspect of the invention is a watermark decoder. The decoder includes a detector for
determining whether a watermark is present in the media signal and determining an alignment 
and scale of the watermark. It also includes a reader for decoding an auxiliary message 
embedded in a time frequency representation of the media signal. 



Further features and advantages of the invention will become apparent from the following
detailed description and accompanying drawings. 

Brief Description of the Figures: 

Figure 1 illustrates an audio signal in the time domain, i.e. magnitude versus time. 

Figure 2 illustrates an audio signal in the frequency domain, i.e. magnitude versus frequency. 

Figure 3A illustrates an audio signal in the time-frequency domain, also known as a
spectrogram of an audio signal, i.e. magnitude versus frequency versus time.

Figure 3B illustrates a perceptual modeling function that operates on a time-frequency
representation of a media signal. 

Figure 4A is a generalized flow diagram of a process for computing a watermark in a
time-frequency domain of a media signal and embedding the watermark in the media signal.

Figure 4B is another flow diagram of a process for computing a watermark in a time-frequency 
domain of a media signal and embedding the watermark in the media signal. 

Figure 4C is a flow diagram illustrating features of Figure 4B and Figure 5A.

Figure 4D is a generalized flow diagram of decoding a time-frequency watermark in an audio 
signal. 

Figure 5A is a more detailed diagram of watermarking an audio signal in the time-frequency
domain. 

Figure 5B is a more detailed diagram of decoding a watermark from an audio signal in the 
time-frequency domain. 






Figure 6 is a diagram of a system for implementing the time-frequency based watermarking. 



Detailed Description: 

To illustrate watermarking technology described in this document, it is helpful to start by
illustrating examples of time, frequency, and time-frequency domain representations of a
media signal. For the sake of illustration, the following discussion illustrates representations of
an audio signal in the time, frequency, and time-frequency domains. Other time varying media
signals, like video, can also be represented in the time, frequency and time frequency domains.

An audio signal can be represented in the time domain, i.e. by a magnitude (e.g., sound
pressure level) versus time curve, as shown in Fig. 1. A segment of an audio signal (such as the
portion of the signal designated by the letter A in Figure 1) can also be represented in a
frequency domain (e.g., Fourier transform domain), as a plot of magnitude versus frequency as
illustrated in Fig. 2.

A digital watermark can be embedded in the audio signal by modifying the signal in the
frequency domain. The dotted line in Figure 2 represents a digital watermark signal. This
watermark signal can be embedded in the original signal to create a watermarked audio signal.
So long as the watermark signal is about 23 dB below the original signal, it will generally not be
noticed by listeners (or viewers of image signals).
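
As a quick arithmetic check of the 23 dB figure, the following sketch converts a level difference in decibels to a linear amplitude ratio. The function name `db_to_amplitude_ratio` is illustrative only and is not part of the described system; it assumes the usual 20*log10 amplitude convention.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a level difference in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (level_db / 20)

# A watermark 23 dB below the host has roughly 7% of the host's amplitude.
ratio = db_to_amplitude_ratio(-23)
print(round(ratio, 4))
```

Under this convention, a watermark held 23 dB below the host is scaled to about one fourteenth of the host amplitude.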

A time varying media signal, such as an audio or video signal, can also be represented in a 
time-frequency domain. In a time frequency representation, the signal is represented as 
magnitude and/or phase versus frequency versus time, as shown in Fig. 3A. In Fig. 3A, the
lighter grayscale colors represent higher magnitudes while darker colors represent lower
magnitudes in the time frequency representation. Some signal transformations, such as certain
types of filter banks (e.g., Quadrature Mirror filters) or wavelets, inherently produce
time-frequency data.

A Fourier analysis, such as an FFT, may be used to create a time-frequency representation by
taking the FFT of several windowed time segments of the audio signal. The segments can be
temporally or spatially overlapping or non-overlapping, as long as the inverse transform takes
into account the extent of the overlap, if any, to properly reconstruct the signal in the domain in
which it is perceived. This reconstruction process is known as overlap-and-add. The
segments can also be windowed, using a Hamming or Hanning window for example, to reduce
the artifacts that the window itself introduces into the frequency representation of the signal.
In audio, time-frequency representations are sometimes referred to as spectrograms.
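
The windowed transform and overlap-and-add reconstruction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses a naive DFT instead of an FFT for self-containment, and the segment length of 8 samples, the hop of 4 samples, and the periodic Hann window are assumptions chosen so that overlapping windows sum to one.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of one windowed segment."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def hann(n):
    """Periodic Hann window: copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft(signal, n=8, hop=4):
    """Time-frequency representation: one DFT per overlapping windowed segment."""
    w = hann(n)
    return [dft([signal[s + i] * w[i] for i in range(n)])
            for s in range(0, len(signal) - n + 1, hop)]

def istft(frames, n=8, hop=4):
    """Inverse transform: inverse DFT of each segment, then overlap-and-add."""
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for j, frame in enumerate(frames):
        for i, v in enumerate(idft(frame)):
            out[j * hop + i] += v
    return out

signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(32)]
frames = stft(signal)
rebuilt = istft(frames)
```

Samples covered by two overlapping windows are reconstructed exactly (up to rounding); the first and last half-segments are covered by only one window and would need padding in a fuller implementation.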

The following sections describe various watermark encoding and decoding methods that 
operate on time frequency representations of media signals. These techniques are applicable to
media signals that have a temporal component, such as audio and video. 

The watermark encoding methods take advantage of perceptual masking of the host media 
signal to hide the watermark. Time-frequency representations provide an opportunity to 
perform perceptual modeling based on temporal and frequency domain masking characteristics
of the signal. In fact, since these representations provide both temporal and frequency 
information, the encoding system may perform temporal and frequency perceptual modeling 
simultaneously on the time-frequency representation of the media signal. 

For audio signals, perceptual masking refers to a process where one sound is rendered inaudible
in the presence of another sound. There are two primary categories of audio masking:
simultaneous and non-simultaneous (temporal). While more complex forms of masking may
exist, simultaneous masking can be classified into three groups: noise masking tone, in which a
narrow band noise masks a tone within the same critical band; tone masking noise, in which a
pure tone occurring at the center of a critical band masks noise of any sub-critical bandwidth or
shape, provided the noise spectrum is below a predictable threshold of the masking tone; and
noise masking noise, in which a narrow band noise masks another narrow band noise.

Simultaneous masking is not limited to within a single critical band; rather, a masker sound 
within one critical band has a masking effect in other critical bands known as the spread of
masking. The effect of a tone masking noise can be modeled by a triangular spreading function
that has slopes of, for example, 25 and -10 dB per Bark. This enables the host audio signal to 
hide or mask more watermark signal on the high frequency side of a loud tone. 
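
A triangular spreading function with the example slopes given above can be sketched as a piecewise-linear function of distance from the masker in Bark. The function name `spreading_db` and the choice to return a level relative to the masker peak (0 dB at the masker) are assumptions made for illustration.

```python
def spreading_db(delta_bark, lower_slope=25.0, upper_slope=-10.0):
    """Relative masking level in dB at delta_bark Bark away from the masker.

    Negative delta_bark is below the masker frequency, positive is above it.
    """
    if delta_bark < 0:
        # Below the masker: level rises at 25 dB per Bark toward the masker.
        return lower_slope * delta_bark
    # Above the masker: level falls at 10 dB per Bark away from the masker.
    return upper_slope * delta_bark

one_bark_below = spreading_db(-1.0)
one_bark_above = spreading_db(1.0)
```

One Bark above the masker the masking level has dropped by 10 dB, while one Bark below it has dropped by 25 dB, which is why more watermark energy can be hidden on the high frequency side.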

Non-simultaneous masking takes advantage of the phenomenon that the masking effect of a
sound extends beyond the time of the presentation of that sound. There is a pre-masking effect
that tends to last only 1-2 milliseconds before the masker sound, and a post-masking effect that 
may extend anywhere from about 50 to 300 milliseconds after the masker, depending on the 
strength and duration of the masker. This enables the host audio signal to hide or mask more 
watermark signal in the temporal portion after a loud tone. 


In a time-frequency representation, the watermark encoder performs simultaneous and
non-simultaneous masking analyses, either independently or in combination, to measure the
masking capability of the signal to hide a watermark. It is worth noting that the type of
masking depends on the nature of the watermark signal and watermark embedding function as
illustrated further below. The encoder employs the frequency domain information to perform
critical band analysis while taking into account the spreading effect. For example, the masking
effect can be modeled with a function that has the following properties in the frequency
dimension: a roughly triangular shaped function, where the
masking effect has a maximum at a selected frequency (i.e. the frequency of the candidate
masker sound), decreases drastically to lower frequencies and decreases more gradually to
higher frequencies relative to the masker.

The encoder may also model temporal masking to take into account pre and post masking 
effects. For example, the masking effect can be modeled with a function that has the following 
properties in the time dimension: a function that has a maximum at the time of presentation of
the masker, decreases drastically before the masker to model the premasking effect, and
decreases more gradually after the masker to model the post masking effect. 

The encoder also analyzes the noise-like vs. tone-like qualities of the audio signal. When the
watermark is embedded by adding a noise-like pseudorandom (PN) sequence, the encoder
assigns higher masking capability values to noise-like signals than tone-like signals. When the
watermark is embedded by adding a tonal signal, the encoder assigns a lower masking
capability to noise. When the watermark signal is embedded by adding a shifted version of the
host signal in the time domain (e.g., a time domain echo) or time frequency domain, the host
signal inherently masks the watermark signal. However, noise segments in the host signal can
mask the watermark signal better (with only a -4 dB threshold per critical band) than tones can
mask other tones (~ -15 dB per Bark) or noise (-25 dB per critical band). In some cases, it is
appropriate to assign a masking capability value of zero or nearly zero so that the encoder
reduces the watermark signal to zero or nearly zero in that location of the time frequency
representation of the host signal.

The perceptual model also accounts for the absolute hearing threshold in determining the
masking capability values. The absolute hearing threshold can be characterized as the amount
of energy needed in a pure tone such that it can be detected by a listener in a noiseless
environment. This threshold can be approximated by a non-linear function:

$$T(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^{2}} + 10^{-3}\,(f/1000)^{4} \quad \text{(dB SPL)}$$

which is representative of a young listener with acute hearing. The perceptual model for
watermarking accounts for this threshold by transforming masking control values in a manner
that is approximately proportional to this threshold. In particular, the gain of the watermark
signal is adjusted in a manner that tracks this threshold: at frequencies where hearing is more
sensitive, the watermark signal gain is lower, and at frequencies where hearing is less sensitive,
the gain is higher.
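
To make the shape of this threshold concrete, the sketch below evaluates the approximation at a few frequencies. It assumes the frequency argument is in Hz (implied by the division by 1000) and reads the partly illegible constants as 3.64, 0.8, 6.5, 0.6, 3.3 and 10^-3; the function name `hearing_threshold_db` is illustrative.

```python
import math

def hearing_threshold_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL for a pure tone at freq_hz."""
    k = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Hearing is most sensitive near 3 to 4 kHz, so the threshold dips there.
for f in (100, 1000, 3300, 10000):
    print(f, round(hearing_threshold_db(f), 2))
```

A gain schedule that tracks this curve would therefore attenuate the watermark most in the 3 to 4 kHz region and allow more gain toward the low and high ends of the spectrum.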

For a PN based watermark signal, both the modeling function for the spreading effect and the 
modeling function for the temporal masking effect may be combined into a single masking
function that models the signal in both the time and frequency dimensions of the spectrogram
simultaneously as depicted in Fig. 3B. This modeling function is implemented as a filter
applied to the time-frequency representation of a signal to compute an array (e.g., a time
frequency mask) of masking control values that modulate the strength of a watermark signal,
such as a spread spectrum carrier signal (a PN sequence in the time frequency domain or 2D
array modulated with an auxiliary message). To show both the simultaneous and
non-simultaneous masking attributes of the filter, the top drawing in Fig. 3B shows a three
dimensional perspective (magnitude vs. time vs. frequency) of the filter, and the bottom 
drawings show the filter from magnitude vs. frequency and magnitude vs. time views. 

The filtering is implemented in stages for a PN based watermark: 1) a first stage measures the
noise attributes throughout the time frequency representation of the signal to compute an initial
array of gain values; 2) a second stage applies the perceptual modeling function shown in Fig.
3B (e.g., by convolution) to modulate the gain values based on the simultaneous and
non-simultaneous masking capabilities; and 3) a third stage adjusts the gain values to account for
the absolute hearing threshold.



As an alternative, the modeling function may be used to identify samples or groups of samples
within the time frequency information that have masking capabilities suitable to hide a
watermark. In this case, the masking control values are used to determine where to apply a
watermark embedding function to samples in the spectrogram. For example, the modeling
function may identify noisy areas and/or edges in the time or frequency dimensions that are
good masker candidates for hiding a watermark signal produced by a particular watermark
embedding function. A vertical edge in the spectrogram (where frequency is along the vertical
axis and time along the horizontal), for instance, provides a masking opportunity for a
watermark embedded along that edge. A horizontal edge, in contrast, may be a poor candidate
since it indicates a consistent tone over time that is less likely to hide certain types of
watermark signals.

While vertical edges provide masking opportunities in some cases, watermarks applied over 
certain types of transients in the temporal domain of an audio signal may be audible. As such,
the watermark encoder identifies these sensitive transients and excludes or reduces the 
watermark signal around them. 

In addition to information provided from perceptual modeling, the watermark encoder also uses 
other criteria for determining the location and strength of the watermark signal. One criterion
is robustness to typical transformations. For example, an audio watermark may be embedded 
so as to survive transformations due to television or radio broadcast, digital bit rate 
compression (such as MPEG audio coding like MP3 or AAC), equalization, normalization, 
digital to analog conversion, ambient room transmission, and analog to digital conversion. To 
make the watermark robust, the encoder may apply the watermark in frequency ranges (e.g.,
200 Hz to 5 kHz) where it is more likely to survive these types of transformations.

[...] reference to Figure 4. Following this description, a more detailed example will be provided
with reference to Figure 5.

An example of time-frequency domain watermarking is outlined in Figure 4A. The signal 400
is divided into blocks, as shown in step 401. Next, each block is converted into the
time-frequency domain, as shown in step 403. For example, the FFT (Fast Fourier Transform) is
applied to overlapping or non-overlapping segments within a block. These segments vary in
length depending on the application. In this particular implementation, the segments are about
[...] milliseconds long. Three such segments are indicated by the lines B, [...]
on the frequency transformation.

Then, a watermark signal is computed from the time frequency representation as shown in step
404. Depending on the nature of the watermark signal, this process may incorporate perceptual
masking analyses described above.

In some applications, the watermark signal is formed, at least in part, from an auxiliary
message comprising a set of symbols, such as a binary or M-ary symbol sequence. Some of
these symbols may be fixed to assist in locating the watermark signal in a suspect signal (e.g.,
a fixed message start or end code or other synchronization or calibration codes). Others may
carry additional information such as one or more numeric or alphanumeric messages,
instructions, control flags, etc. To make the message signal more robust to manipulation, it
may be repeated, error correction encoded and spread spectrum modulated. Examples of error
correction encoding schemes include BCH, convolution codes, turbo codes, Reed Solomon codes,
etc. Other forms of symbol encoding may be used as well such as M sequences and gold
sequences.

A binary or M-ary message signal can be spread spectrum modulated by spreading it over a
pseudorandom number. The pseudorandom number acts as a carrier of the message signal. In
particular, a binary antipodal message signal can be spread over a pseudorandom number by
repeating the message signal and multiplying it by a pseudorandom antipodal signal. The resulting
modulated message signal can be computed by modulating a binary message signal with a
pseudorandom sequence using an XOR operator.
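
The spreading step can be sketched as follows: each message bit is repeated, then XORed with a keyed pseudorandom bit sequence, which is equivalent to multiplying the corresponding antipodal signals. The helper names, the seed used as a key, the eight chips per bit, and the majority-vote despreader are all assumptions for illustration; Python's `random` module stands in for whatever pseudorandom generator an implementation would actually use.

```python
import random

def pn_sequence(length, seed):
    """Keyed pseudorandom bit sequence (the carrier); seed acts as the key."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]

def spread(bits, chips_per_bit, seed):
    """Repeat each message bit and XOR the result with the PN carrier."""
    pn = pn_sequence(len(bits) * chips_per_bit, seed)
    repeated = [b for b in bits for _ in range(chips_per_bit)]
    return [r ^ p for r, p in zip(repeated, pn)]

def to_antipodal(bits):
    """Map bit 0 to +1 and bit 1 to -1, so XOR of bits becomes multiplication."""
    return [-1 if b else 1 for b in bits]

def despread(chips, chips_per_bit, seed):
    """Remove the carrier with XOR, then majority-vote each group of chips."""
    pn = pn_sequence(len(chips), seed)
    raw = [c ^ p for c, p in zip(chips, pn)]
    groups = (raw[i:i + chips_per_bit] for i in range(0, len(raw), chips_per_bit))
    return [1 if 2 * sum(g) > len(g) else 0 for g in groups]

message = [1, 0, 1, 1]
chips = spread(message, chips_per_bit=8, seed=42)
recovered = despread(chips, chips_per_bit=8, seed=42)
```

Because each bit is carried by several chips, a few corrupted chips per bit still decode correctly, which is the robustness benefit the repetition provides.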



PCT/US01/28927

WO 02/23883 



As part of the process of computing the watermark signal (404), the encoder transforms the 
message signal into a watermark signal. It then combines the watermark signal with the host 
signal as shown in step 405. The process of combining the watermark signal may be performed
in the time-frequency domain, the time domain, or some other transform domain. For example,
the encoder may compute the watermark signal in the time frequency domain, transform it into
the time domain, and then add the time domain watermark signal to the host signal.
Alternatively, the encoder may embed the watermark signal into the time frequency
representation of the host signal and transform the result into the time domain to produce the
watermarked signal.

The manner in which the watermark signal is combined with the host audio signal depends on 
the details of the embedding function, and any perceptual masking methods incorporated into 
the embedding process. Preferably, the encoder performs a perceptual masking analysis of the 
time frequency signal, and uses the result of this masking analysis to control the process of
embedding the message signal in the host signal. 

To illustrate the embedding process in the time frequency domain, it is helpful to consider some 
examples. In one implementation, a time frequency domain perceptual mask is derived from
the time frequency representation of the host audio signal by passing a filter over the time
frequency representation of the host signal as described above. The perceptual mask comprises
an array of gain values in the time frequency domain. The encoder generates the time
frequency representation of the message signal by mapping the spread spectrum modulated
message signal to sample locations in the time frequency domain. The perceptual mask is then
applied to (multiplied by) corresponding binary antipodal elements in the time frequency
representation of the message signal to form a watermark signal.
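
Applying the mask is an element-wise product of two arrays of the same shape. The sketch below uses small hypothetical arrays (rows as time segments, columns as frequency bins); the gain values are made up for illustration and are not derived from a real perceptual model.

```python
def apply_mask(mask, message):
    """Multiply each gain value by the corresponding antipodal message element."""
    return [[gain * element for gain, element in zip(gain_row, message_row)]
            for gain_row, message_row in zip(mask, message)]

# Hypothetical 2x2 time-frequency arrays: gains in [0, 1], message elements +1 or -1.
mask = [[0.0, 0.5],
        [1.0, 0.25]]
message = [[1, -1],
           [-1, 1]]
watermark = apply_mask(mask, message)
```

A gain of zero suppresses the watermark entirely at that time-frequency location, while larger gains let more of the message element through where the host can mask it.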

Next, the time frequency representation of the watermark signal is converted to the time 
domain by performing an inverse transform from the time frequency domain to the time 
domain.

Finally, the time domain watermark signal is added to the original host audio signal, as shown 
in step 405. The result is the watermarked signal 407. 






In another implementation, the encoder embeds the watermark signal by modulating peaks in
the time frequency representation of the host signal. The encoder first identifies peaks within a
given time frequency range of a block of audio. A binary message signal is then encoded
around the N largest peaks as follows.

A peak sample in the time frequency domain is represented as the variable x, neighboring
time-frequency samples at consecutive times after x in the time dimension are a and b, and
neighboring samples at consecutively higher frequencies in the frequency dimension are c and
d. The encoder modulates the peak so that:

$$a = b + \frac{3x - b}{4} \quad \text{and} \quad c = d + \frac{3x - d}{4}$$

to encode a one; and

$$a = b + \frac{x - b}{4} \quad \text{and} \quad c = d + \frac{x - d}{4}$$

to encode a zero. To read the message, the decoder converts the watermarked signal to the time
frequency domain, identifies the N largest peaks and computes the message values as follows.

$$a > b + \frac{x - b}{2} \quad \text{and} \quad c > d + \frac{x - d}{2}$$

to decode a one; and

$$a < b + \frac{x - b}{2} \quad \text{and} \quad c < d + \frac{x - d}{2}$$

to decode a zero. As a variation, the encoder may modulate additional neighboring samples
(more than just c and d) around the peak to encode a message symbol.

Another form of peak modulation is to identify the two top peaks in a block of the time
frequency representation of the signal and modulate the relative heights of these two peaks.
For example, a decrease in the relative peak differences represents a binary 0, while an
increase in the relative peak differences represents a binary 1.
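
The first peak modulation scheme (the one using the neighbors a, b, c and d) can be sketched as a pair of functions. The equations for encoding a one are partly illegible in the source; this sketch assumes a numerator of 3x - b (and 3x - d) over a denominator of 4, and the function names and the `None` result for disagreeing tests are illustrative choices rather than part of the described method.

```python
def encode_peak(x, b, d, bit):
    """Return modulated neighbors (a, c) of peak x; b and d are left unchanged."""
    if bit:
        a = b + (3 * x - b) / 4  # assumed reading of the "one" equations
        c = d + (3 * x - d) / 4
    else:
        a = b + (x - b) / 4
        c = d + (x - d) / 4
    return a, c

def decode_peak(x, a, b, c, d):
    """Return 1 or 0, or None when the time and frequency tests disagree."""
    if a > b + (x - b) / 2 and c > d + (x - d) / 2:
        return 1
    if a < b + (x - b) / 2 and c < d + (x - d) / 2:
        return 0
    return None

x, b, d = 10.0, 2.0, 4.0
a_one, c_one = encode_peak(x, b, d, 1)
a_zero, c_zero = encode_peak(x, b, d, 0)
```

The decoder only compares each neighbor against the midpoint between the peak and the outer neighbor, so the message survives moderate scaling of the whole neighborhood.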


In another implementation, the encoder embeds a message by performing echo modulation in
the time frequency domain. In particular, the encoder segments a time frequency representation
of a block into different frequency bands. In each of these bands, the encoder adds a low
amplitude, time-frequency shifted version of the host signal to encode a desired symbol in the
message signal. The amount and direction of the shift is a function of a secret encoding key
that maps a desired symbol to be encoded to a particular direction and amount of shift. The
direction of the shift varies from one band to the next to reduce the chances of false positives,
and the shift is represented as a vector with both frequency and time components. The encoder
may embed additional message symbols or the same message repeatedly by repeating the
process in additional time frequency blocks of the host signal.

To detect the echo modulation, a decoder performs auto correlation of the time frequency block 
of a watermarked signal. The message symbol is decoded based on the location of an 
autocorrelation peak in each frequency band. 
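
The idea of decoding a symbol from the location of an autocorrelation peak can be illustrated in one dimension. This sketch is a simplification: it treats a single band as a sequence of samples along the time axis only, models the host as seeded Gaussian noise, and uses hypothetical delays of 7 and 13 samples for the symbols zero and one and an echo amplitude of 0.5.

```python
import random

def add_echo(signal, delay, amplitude=0.5):
    """Add a low amplitude copy of the signal shifted later by `delay` samples."""
    return [s + (amplitude * signal[i - delay] if i >= delay else 0.0)
            for i, s in enumerate(signal)]

def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of the signal at a single lag."""
    return sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))

def decode_symbol(signal, lag_for_zero, lag_for_one):
    """Pick the symbol whose candidate lag shows the stronger autocorrelation."""
    if autocorrelation(signal, lag_for_one) > autocorrelation(signal, lag_for_zero):
        return 1
    return 0

rng = random.Random(7)
host = [rng.gauss(0.0, 1.0) for _ in range(2000)]
marked_one = add_echo(host, delay=13)   # hypothetical key: shift of 13 encodes a one
marked_zero = add_echo(host, delay=7)   # hypothetical key: shift of 7 encodes a zero
```

In the scheme described above the shift is a two-dimensional vector, so a fuller version would search autocorrelation peaks over both time and frequency offsets within each band.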


One variation to this method is to encode message symbols based on the extent of the 
autocorrelation. In particular, the amount of autocorrelation in a given band or in each of a set 
of bands of the time frequency representation corresponds to a desired message symbol. 

In each of these methods, the encoder computes the watermark based on time frequency
information and embeds it in the time frequency domain. In some cases, the encoder 
transforms a time-frequency watermark signal to the time domain and combines it with the host 
signal in the time domain. In others, it transforms the watermarked signal from the time 
frequency domain to the time domain. 


To avoid distortion of the signal, the time-frequency transform should have an inverse. For 
example, certain types of filter banks, such as quadrature mirror filters have inverses. Wavelet 
transforms also have inverses. Time-frequency transforms based upon windowed Fourier 
transforms have an inverse computed by performing the inverse FFT on each segment and then 
adding the segments back together to get a time domain signal. If the segments were
non-overlapping, each inverse FFT of each segment connects with the other. If the segments were
overlapping, each inverse FFT is overlapped and added appropriately.

Additional operations may be performed to enhance detectability and reduce perceptibility of
the watermark signal. Host signal samples in the time frequency domain that are already
consistent with the watermark signal need not
be modified as much as samples that are inconsistent with the watermark signal. For
example, a binary antipodal watermark signal includes positive and negative values that add to or
subtract from corresponding samples of the host signal. If a sample or group of samples in the
host signal corresponding to a positive watermark signal is already greater than its neighbors,
then the host signal need not be changed or may be changed less to embed the positive
watermark signal element. This same perceptual modeling technique applies to other forms of
watermark signals, such as those that modulate peaks or edges of the time frequency
representation, add echoes or modulate other statistical features of the host signal. In general,
the gain values of the perceptual mask (or the corresponding watermark values) may be
adjusted based on the extent to which the host signal properties are consistent with the
watermark signal properties.

Another enhancement to improve the watermark [...]
of the watermarked signal. For example, if the embedding process adds a modulated noise
signal or echo, it should do so in a manner that is distinct from the noise or echo signals
[...]
echo by giving the synthetic echo properties that are unlikely or impossible to occur naturally
[...] frequency bands).



Figure 4B shows a related embedding process. This alternative is efficient for embedding a
watermark in a limited frequency range. The process is similar to that of Figure 4A, except that
it includes down-sampling, as shown in step 452, and up-sampling, as shown in step 456.
Every step in Figure 4A has a similar step in 4B with the step number shifted by 50 (i.e., 403
corresponds to 453). Thus, the discussion is focused on the new steps 452 and 456.



The down-sampling and up-sampling allow the watermark to be computed using a portion of
the host signal. The portion can be selected such that the watermark will be more robust and/or
less perceptible, e.g., selecting a designated mid-range frequency band to encode the watermark
signal. The encoder can perform pre-filtering operations, such as down/up sampling, band pass 
filtering, etc., to select a portion of the host signal for perceptual analysis and watermark 
embedding before or after the time-frequency transformation. 

The down-sampling step 452 includes application of an anti-aliasing filter. The anti-aliasing
filter ensures that the signal has a bandwidth of at most half of the sampling rate after the
down-sampling step. The anti-aliasing filter may use a low-pass filter, or a band-pass filter to limit the
watermark to a specific frequency range of the host signal. In this document, "d" represents an
integer parameter that indicates the amount of down-sampling. For example, if "d" is 4 and the
audio signal is at a sampling rate of 44.1 kHz (which is a typical audio CD sampling rate), the
signal is down-sampled to 11.025 kHz.
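
The arithmetic and the basic structure of the down-sampling step can be sketched as follows. The boxcar average used here is only a crude stand-in for a proper anti-aliasing filter, chosen to keep the example short; the function name `downsample` is illustrative.

```python
def downsample(signal, d):
    """Average each run of d samples (a crude low-pass), keeping one value per run."""
    return [sum(signal[i:i + d]) / d for i in range(0, len(signal) - d + 1, d)]

original_rate_hz = 44100
d = 4
new_rate_hz = original_rate_hz / d  # 11025.0 Hz, i.e. 11.025 kHz

reduced = downsample([1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0], d)
```

A production encoder would replace the boxcar with a designed low-pass or band-pass filter so that content above the new Nyquist frequency is properly suppressed before samples are discarded.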


The up-sampling step 456 may be implemented using a variety of methods. One method is to 
insert zeros between data points and filter with a high-order low-pass filter with the cutoff 
frequency at half the final sampling rate. It can also include first order interpolation, or, for a
more accurate representation, it can include convolving the signal with the sinc (sin(x)/x)
function to create new points.
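
Of the up-sampling options listed, first order (linear) interpolation is the simplest to sketch. The function name `upsample_linear` is illustrative, and the sketch assumes the output should end on the last input sample rather than extrapolate past it.

```python
def upsample_linear(samples, d):
    """Insert d - 1 linearly interpolated points between each pair of samples."""
    out = []
    for left, right in zip(samples, samples[1:]):
        for k in range(d):
            out.append(left + (right - left) * k / d)
    out.append(samples[-1])  # keep the final input sample as the last output
    return out

restored = upsample_linear([0.0, 4.0, 0.0], 4)
```

Zero insertion followed by low-pass filtering, or sinc interpolation, follows the same pattern of producing d output points per input point but gives a more accurate reconstruction at higher cost.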

The down-sampling and up-sampling result in a transformed and possibly degraded audio 
signal, so it is preferred to compute and add the watermark back to the original audio signal, as 
shown in step 455. 


Finally, the time domain watermark signal is added back to the original audio signal 450, which 
results in a watermarked signal 457. 

Certain generally applicable features of the process shown in Figure 4B are summarized in 
Figure 4C. These features include computing the watermark from a transformed version of the
host signal and adding it back to the host signal in its original domain. Note that this process is
applicable to a variety of content types, such as images, audio and video. These basic steps are
also reflected in Figure 5A, which shows an example implementation of a time-frequency
watermark encoder. 

Figure 4D shows an example of a watermark reader compatible with the embedder technology
described above. Reading begins with dividing the audio signal 470 into blocks, as shown in
step 471. Each block is converted into the time-frequency domain, as shown in step 472.
From the frequency domain the watermark is read, as shown in step 473.

The specific details of the watermark reading process depend on the embedding function. In
one implementation, the watermark is computed as a perceptually adapted, pseudorandom
noise signal with elements that increase or decrease corresponding samples in the time-
frequency [...] signal suspected of containing a watermark. One way to detect the watermark is
to perform correlation between a known property of the watermark, such as the pseudorandom
carrier signal used to spread spectrum modulate the message, and the suspect signal. If the
suspect signal is likely to be corrupted, such as by time or frequency scaling or shifting, a
calibration signal may be used to detect it and compensate for the corruption.

For more information about watermark embedding, detecting (including synchronization) and
reading, see US Patent Nos. 5,862,260 and 6,122,403, and co-pending application 09/503,881,
filed February 14, 2000.

Figure 5A is a diagram illustrating a time-frequency domain watermark embedding process for
an audio signal (501). The first step (indicated by block 502) divides the audio into segments,
each L seconds long. Each segment, therefore, has (44100 times "L") data points.

As indicated by block 503, each segment is down-sampled by an integer value "d", thereby
creating a signal at (44.1 divided by "d") kHz.

Blocks 505, 506 and 507 indicate that a Hamming window of width "w" is moved along the data
and that an FFT with "w" points is applied to each set of "w" points as the window is moved
along the data. The FFT is applied "r" times, where "r" is one half of "w". An FFT generates a
signal that includes a complex conjugate signal. The watermark embedding function should
retain complex conjugate symmetry.

The process depicted in blocks 505, 506 and 507 results in a time-frequency representation of
the signal (similar to blocks 403 or 453) which has dimensions of "r" times "r".

The length of the segment chosen, the width of the FFT, the size of the resulting time-
frequency representation, and the downsizing parameter "d" are matters of engineering design,
and they can be chosen to meet the needs of a particular application; however, these parameters
are related. They satisfy the following equation:

44100 * L = r * d * w
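
The windowed transform and the parameter relation can be sketched together. The sketch reads the three factors on the right-hand side of the equation as the number of transforms "r", the down-sampling factor "d" and the window width "w". It assumes non-overlapping windows (which is what the equation implies), uses a direct DFT in place of an FFT to stay within the standard library, and keeps only the first "r" bins of each transform to obtain an r-by-r array. All function names are hypothetical.

```python
import cmath
import math

RATE = 44100  # samples per second, as in the text

def segment_seconds(r, d, w, rate=RATE):
    """Segment length L that satisfies rate * L = r * d * w."""
    return r * d * w / rate

def hamming(w):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (w - 1)) for n in range(w)]

def dft(frame):
    """Direct O(w^2) discrete Fourier transform; an FFT yields the same values faster."""
    w = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / w) for t in range(w))
            for k in range(w)]

def time_frequency(segment, w):
    """Split a down-sampled segment into r = w/2 windows and transform each one."""
    r = w // 2
    if len(segment) != r * w:
        raise ValueError("segment must hold exactly r * w samples")
    window = hamming(w)
    rows = []
    for i in range(r):
        frame = [segment[i * w + t] * window[t] for t in range(w)]
        # For real input, bin w-k is the complex conjugate of bin k, so bins
        # 0..r-1 are kept; dropping the Nyquist bin r is an assumption made
        # here to get an r-by-r array.
        rows.append(dft(frame)[:r])
    return rows

w = 16
r = w // 2
segment = [math.cos(2 * math.pi * 3 * t / w) for t in range(r * w)]  # tone in bin 3
tf = time_frequency(segment, w)
print(len(tf), len(tf[0]))                      # 8 8
print(round(segment_seconds(128, 4, 256), 3))   # 2.972
```

With w = 256, r = 128 and d = 4, for example, each segment covers roughly 2.97 seconds of 44.1 kHz audio.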

Next, as indicated by block 510, the watermark data is computed in the time-frequency domain
using a perceptually adaptive watermarking process. In one implementation, the encoder
computes and embeds the watermark signal by identifying and then modulating peaks in the
time-frequency domain to encode binary message symbols. Specific examples of these peak
modulation embedding functions are described above.

In another implementation, the encoder computes a time frequency domain watermark signal
by adapting a binary anti-podal pseudorandom message signal to the time frequency
representation of the host signal. In particular, the encoder generates the message signal by
spread spectrum modulating an error correction encoded message with a pseudorandom
number. The resulting signal is anti-podal (e.g., 1 represented as a positive number, and 0
represented as a negative number) and is mapped to sample locations in the time frequency
representation of the host signal. The encoder adapts the message signal to the host signal by
computing a perceptual mask as explained above. The encoder convolves a perceptual analysis
filter over the time frequency representation to compute the perceptual mask. This analysis
takes into account a measure of the noise attributes and the simultaneous and non-simultaneous
masking attributes of the time-frequency signal to create an array of gain values and adjusts the
gain values based on the absolute hearing threshold. It then multiplies the gain values by
corresponding elements in the message signal to compute a perceptually adapted, time
frequency watermark signal.
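
The spreading and gain steps can be illustrated with a short sketch. It assumes a seeded pseudorandom +1/-1 carrier, omits error correction coding, and uses placeholder gain values where a real encoder would use the perceptual mask; `pn_sequence`, `spread` and `adapt` are hypothetical names.

```python
import random

def pn_sequence(length, seed):
    """Pseudorandom carrier of +1/-1 chips; the seed acts as a shared key."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def spread(bits, chips_per_bit, seed):
    """Spread each bit over chips_per_bit carrier chips (1 -> +chip, 0 -> -chip)."""
    carrier = pn_sequence(len(bits) * chips_per_bit, seed)
    return [(1 if bits[i // chips_per_bit] else -1) * chip
            for i, chip in enumerate(carrier)]

def adapt(message_signal, gains):
    """Scale each anti-podal element by the gain allowed at its location."""
    return [g * m for g, m in zip(gains, message_signal)]

bits = [1, 0, 1, 1]   # stands in for an error correction encoded message
message_signal = spread(bits, chips_per_bit=8, seed=42)
gains = [0.1 * (1 + i % 3) for i in range(len(message_signal))]  # placeholder mask values
watermark = adapt(message_signal, gains)
```

Each element of `watermark` would then be mapped to one sample location in the time frequency representation.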

A further enhancement of the perceptual mask is to adjust the gain values based on whether the 
host signal sample value or values corresponding to a watermark message signal element have 
values that are consistent with the message element to be encoded. If they are already 
consistent, the gain can be reduced; otherwise the gain can be increased to increase the 
detectability of the watermark signal. 

Next, as indicated by block 511, the watermark signal is converted to a time domain signal. If
the watermark signal is already embedded in the time frequency representation of the host
signal, it can be calculated by taking the difference between the watermarked signal and the
un-marked signal. One way is to subtract the un-marked signal (before block 510) from the
watermarked signal in the time frequency domain and then convert the result to the time
domain. Another way is to convert both the un-marked [...] processed.

As indicated by block 530, the resulting [...]

A calibration signal (also referred to as a synchronization signal) can be embedded before or
after embedding a message signal, or as part of the process of embedding the message signal
into the original audio. The calibration signal is used to align the blocks between the reader
and embedder, as shown in step 509. In one embodiment, the calibration signal comprises a set
of impulse functions at known frequencies in the Fourier magnitude domain. The calibration
signal may be [...] calibration signal in the time frequency domain as described above.

The calibration signal may be defined in the time-frequency domain. For example, the impulse
functions can be set at known frequencies and times in a time-frequency representation. To
[...] (or some transform of these domains, such as log, or log-log sampling).

Figure 5B shows the process for decoding a watermark from an audio signal.
Optionally, the watermark decoder begins by detecting the watermark and determining its [...]
the scaling and orientation of the watermarked signal after watermark embedding [...] audio
[...] the reader for decoding the embedded message, as shown in step 55[...].

One form of a calibration signal is a signal with known peaks in the magnitude versus
frequency (or Fourier) domain with random phase. The location of the peaks can be used to
determine the correct sampling rate and compensate for time scaling distortion. The decoder
can detect the calibration signal in the marked signal by correlating the marked signal with a
reference calibration signal. The point of maximum correlation provides the correct block
alignment. The decoder can perform this detection operation in the time domain using cross-
correlation, in the frequency domain using convolution, or in some other transform domain or
projection of the watermarked signal, such as a log or log-log re-sampling of the signal.
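
Alignment by time-domain cross-correlation can be sketched as follows. The reference used here is a Barker-13 code standing in for the reference calibration signal, and the host is a small synthetic sine; both are assumptions chosen only to make the correlation peak unambiguous.

```python
import math

def cross_correlate(signal, reference):
    """Sliding dot product of reference against signal, one score per offset."""
    m = len(reference)
    return [sum(signal[lag + k] * reference[k] for k in range(m))
            for lag in range(len(signal) - m + 1)]

def find_alignment(signal, reference):
    """Offset at which the correlation with the reference is largest."""
    scores = cross_correlate(signal, reference)
    return max(range(len(scores)), key=lambda lag: scores[lag])

reference = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]   # Barker-13 (stand-in)
host = [0.1 * math.sin(0.3 * n) for n in range(100)]      # synthetic host signal
offset = 37
marked = list(host)
for k, value in enumerate(reference):
    marked[offset + k] += value

print(find_alignment(marked, reference))   # 37
```

The returned offset plays the role of the block alignment; a practical detector would also compare the peak against a threshold before trusting it.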

A log or log-log resampling simplifies detection operations. For example, a log sampling of a
watermarked signal converts scaling in the pre-sampled signal dimension to a translation or
shift in the post-sampled dimension. This enables the decoder to use correlation methods such
as generalized matched filters to compute the scaling distortion in the post-sampled dimension.
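
The reason log sampling helps is that log(a * f) = log(a) + log(f): a scale factor "a" becomes a constant shift. A small sketch, assuming the calibration peak frequencies are known and already matched to the observed peaks (the frequencies and the name `estimate_scale` are hypothetical):

```python
import math

def estimate_scale(reference_peaks, observed_peaks):
    """Average shift between log-frequencies, mapped back to a linear scale factor."""
    shifts = [math.log(o) - math.log(r)
              for r, o in zip(reference_peaks, observed_peaks)]
    return math.exp(sum(shifts) / len(shifts))

reference_peaks = [200.0, 400.0, 800.0]                 # hypothetical peak frequencies, Hz
observed_peaks = [1.25 * f for f in reference_peaks]    # signal played 25 % faster
scale = estimate_scale(reference_peaks, observed_peaks)
```

In a detector the shift would normally be found by correlating the log-sampled spectra rather than by matching peaks one by one.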

In cases where the calibration signal is embedded in the time-frequency domain, the system
first finds the scaling factor in the time-frequency domain. Then, after re-sampling, the system
finds the correct alignment (i.e. offset of the blocks from the beginning of the audio signal)
from the time-frequency domain. After it finds the correct alignment, the decoder re-aligns
itself and starts reading the embedded message.

The decoder periodically checks scaling and alignment, e.g., every 10 seconds or so, to check
for drift.

In order to read an embedded message from an audio signal, the signal is divided into blocks
L seconds long, as shown in step 552. These segments are then transformed into the time
frequency domain, as shown in steps 555, 556, 557. A message decoder is then used to read
the watermark, as shown in step 574. The decoder operates on the remaining audio similarly,
as shown in steps 575, 576 and 552.

The implementation of the watermark message reader depends on the embedding function. The
message reader is compatible with the embedding function used in the encoder and any symbol
coding processes applied to the embedded message. If the embedding function modulates
peaks to encode a binary message, then the reader evaluates peaks in the time-frequency
representation to extract the message signal estimates. Examples of decoding a peak
modulation watermark are provided above.

If the embedding function modulates sample values with a binary anti-podal signal as described
previously, then the reader analyzes the time frequency values to estimate the polarity of the
watermark signal at selected locations in the time frequency representation corresponding to
each message signal element. The polarity provides an estimate of the message signal element,
which may be aggregated with other estimates to more accurately decode the embedded
message. The reader calculates the polarity of each watermark signal element by performing
predictive filtering on the time frequency samples to estimate the original, un-watermarked
signal in the time frequency domain. It subtracts the estimate of the original signal, and the
polarity of the difference signal indicates whether the watermark added to or subtracted from
(encoded a binary 1 or 0, respectively) the host signal in the time frequency domain.

One form of predictive filtering is to compute for each time frequency sample expected to be
encoded a local average of samples in a surrounding neighborhood. This local average
provides an estimate of the original sample value, which is then subtracted to compute a
difference signal. The difference signal should approximate the watermark signal.
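
A minimal sketch of this local-average predictor, operating on a two-dimensional array of real time frequency values. Excluding the sample itself from the average and using a square neighbourhood of radius 1 are assumptions; the names are hypothetical.

```python
def local_average(tf, i, j, radius=1):
    """Mean of the samples around (i, j), excluding (i, j) itself."""
    total, count = 0.0, 0
    for a in range(max(0, i - radius), min(len(tf), i + radius + 1)):
        for b in range(max(0, j - radius), min(len(tf[0]), j + radius + 1)):
            if (a, b) != (i, j):
                total += tf[a][b]
                count += 1
    return total / count

def estimate_bit(tf, i, j, radius=1):
    """Polarity of (sample - predicted original): positive -> 1, otherwise 0."""
    difference = tf[i][j] - local_average(tf, i, j, radius)
    return 1 if difference > 0 else 0

host = [[10.0] * 5 for _ in range(5)]   # flat synthetic host
marked = [row[:] for row in host]
marked[2][2] += 1.0                     # watermark added at (2, 2)
marked[2][4] -= 1.0                     # watermark subtracted at (2, 4)
print(estimate_bit(marked, 2, 2), estimate_bit(marked, 2, 4))   # 1 0
```

On real audio each estimate is noisy, which is why the estimates for one message element are aggregated before a decision is made.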

Note that while predictive filtering enhances decoding, it is not required. A PN based antipodal
watermark signal can be decoded by correlating the time frequency representation of the
watermarked signal with the PN carrier signal that was modulated with message data.

The decoder performs spread spectrum demodulation and error correction decoding on the
message signal estimates to decode the embedded message.
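
Spread spectrum demodulation by correlation with the PN carrier can be sketched as below. The sketch assumes the decoder can regenerate the carrier from a shared seed, treats each bit as a run of consecutive chips, and leaves out error correction decoding; the names and the synthetic disturbance are illustrative.

```python
import math
import random

def pn_sequence(length, seed):
    """Pseudorandom carrier of +1/-1 chips, regenerated from the shared seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def despread(estimates, chips_per_bit, seed):
    """Correlate each run of chips with the carrier; the sign of the sum gives the bit."""
    carrier = pn_sequence(len(estimates), seed)
    bits = []
    for start in range(0, len(estimates), chips_per_bit):
        corr = sum(estimates[k] * carrier[k]
                   for k in range(start, start + chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits

sent = [1, 0, 0, 1]
chips = 16
carrier = pn_sequence(len(sent) * chips, seed=7)
# Synthetic per-sample estimates: the anti-podal chip plus a bounded disturbance.
estimates = [(1 if sent[k // chips] else -1) * carrier[k] + 0.5 * math.sin(k)
             for k in range(len(carrier))]
print(despread(estimates, chips, seed=7))   # [1, 0, 0, 1]
```

Summing over many chips is what lets the correct polarity emerge even when individual estimates are unreliable.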

The remaining audio may carry the same data as the other blocks, repeated throughout the
audio, such as a unique ID per song, or contain new data, such as the lyrics. The ID may be
repeatedly spread over several blocks.



Other methods of watermarking the audio data in the time frequency domain are also possible.
One could modulate the statistical features of the waveform, such as echoes or energy windows,
use least significant bit replacement, or modulate waveform heights (see U.S. Appn.
09/404,292).

As noted above, the watermark encoder could embed a watermark using a copy of the signal
with much lower amplitude and slightly shifted in the time-frequency domain to encode bits.
These shifts can be thought of as low magnitude echoes with shifted frequency and/or time.
This type of encoder embeds data by predefining one specific shift as a "1" and another specific
shift as a "0". The amount of time and the angle of shift can be used to encode data bits, and
thus transmit hidden information. Specifically, a shift of 45 degrees down and back consisting
of 5 previous time points and 5 lower frequency points could be a "1", whereas a shift of 45
degrees up and forward consisting of 5 future time points and 5 higher frequency points could
be a "0". The data could be read using a two dimensional autocorrelation or any other existing
method of two dimensional shift (i.e. echo) calculation.

More specifically, the feature could be modulated differently in specific regions of the time
frequency domain such that a room or broadcast could never simulate the feature. For example,
the 5 point 45 degree shift discussed above could be used in an up and forward direction below 1
kHz and down and back above 1 kHz to represent a "1", and the inverse signal could be used to
represent a "0".

Finally, for synchronization of the watermark decoder, the watermark system can define a
specific feature that represents a synchronization signal and is used to determine the beginning
of a message or used to break a message into frames. This is in addition to or as an alternative
to using a specific payload, such as "10101010" to represent this synchronization (a
message symbol or set of symbols that signals the presence, start or end of a watermark signal).
For example, echoes purely in time could be used for the message data and echoes purely in
frequency could be used for synchronization.

Also, a time domain, low amplitude PN signal could be used to determine the temporal location
of a watermark signal as well as the time scale modifications of the watermarked audio signal
since being encoded with the watermark. In the decoder, a watermark detector uses this PN
signal to detect a watermark and to determine the shift (temporal location, or origin) and time
scale of the watermark. In particular, it performs a correlation between the PN signal and the
watermarked signal. The decoder uses the location and time scale that provides a maximum
correlation to align the watermarked data before performing message decoding operations (such
as transforming to the time frequency domain and extracting an embedded message).

Other watermark systems can be used to encode and decode the watermark. For example,
watermark systems that apply to two dimensional signals, like image signals, can be applied to
the two dimensional time-frequency representation of the audio signal to encode and decode
watermark signals. Watermark systems described in US Patent Application No. 09/503,881
and US Patent No. 5,862,260 can be applied to encode and decode watermark signals from the
time-frequency representations of audio and video.

Fig. 6 shows a system for implementing the invention. An audio input source 601 provides
audio data to a data handling program 602A in computer 600 (e.g., Personal Computer,
Personal Digital Assistant, Phone, Set-top box, audio player, video player or other device with
processing logic and memory). An FFT program 602B performs the steps shown in block 505
of Figure 5. A perceptively adaptive watermarking program 602D performs the actions shown
in block 510 in Figure 5. A Hamming windowing program 602C performs the Hamming
windowing function of blocks 505 and 506 in Figure 5. After embedding the watermark, the
system provides a watermarked signal output 605. For watermark decoding operations, the
system may also be equipped with a watermark decoding program.

There are a number of possible implementations and design variations to the digital 
watermarking methods described above. One implementation for synchronizing the watermark 
detector for a suspect audio signal that has undergone time shift and time scale distortion is to 
embed a series of strong echoes at intervals throughout the host audio signal as a hidden 
synchronization signal. This echo may be a time-shifted version of the host audio signal or a
time-frequency shifted version of the host audio signal in a particular frequency band or bands.
The level and frequency band of the echo signal are selected so as to be imperceptible when
added to the host audio signal.

In one such implementation, the digital watermark detector performs an autocorrelation
of a suspect audio signal segment, and detects the synchronization signal as a peak in the
resulting autocorrelation signal. If the peak cannot be found with a desired level of certainty
(e.g., exceeding a threshold normalized to the suspect signal), the detector proceeds to
upsample and/or downsample the audio segment. The re-sampled segment with the highest
autocorrelation peak is likely to correspond to the embedding time scale. The detector then
time scale transforms the suspect audio signal to the embedding time scale and extracts the
watermark message, for example, using one or more of the techniques described above. This
synchronization technique enables the detector to compensate for time scale errors like linear
speed change or pitch invariant time scale modification, and also compensates for timing errors.
Further calibration on known embedded symbols or signal sequences can be used to provide
more accurate timing alignment of the embedded watermark signal.
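
The peak test at the heart of this detector can be sketched as follows. Only the autocorrelation search and the normalised threshold are shown; the re-sampling retry is left out. The toy host (a Barker-13 burst surrounded by silence), the threshold value and the function names are assumptions made so that the behaviour is easy to check.

```python
def autocorrelation(signal, lag):
    return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

def find_echo(signal, min_lag, max_lag, threshold=0.2):
    """Lag of the strongest autocorrelation peak in [min_lag, max_lag],
    or None if the peak, normalised by the signal energy, is too weak."""
    energy = autocorrelation(signal, 0)
    best = max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(signal, lag))
    strength = autocorrelation(signal, best) / energy
    return best if strength > threshold else None

burst = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]   # Barker-13 toy host content
host = [0.0] * 10 + [float(v) for v in burst] + [0.0] * 40
delay, gain = 20, 0.5
marked = list(host)
for n in range(delay, len(host)):
    marked[n] += gain * host[n - delay]               # add a time-shifted echo

print(find_echo(marked, min_lag=13, max_lag=30))   # 20
print(find_echo(host, min_lag=13, max_lag=30))     # None
```

A detector following the text would call `find_echo` on several re-sampled copies of the segment and keep the time scale that gives the strongest peak.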

There are a number of ways to embed the strong echo signal described in the previous
paragraph. The strong echo may comprise multiple echoes at different frequency bands and
time delays. Further, the sign of the autocorrelation of the audio signal may be modulated to
further distinguish the synchronization signal from the host audio signal.

The echoes may be used to encode both calibration information as well as a message signal
comprised of variable symbols. Further, a known message sequence, possibly a pseudo random
sequence, can be embedded using echo modulation for further calibration of the detector for
accurate recovery of the variable symbols. In one such implementation, the detector decodes a
stream of potential message bits by analyzing the echo modulation of the suspect signal. It then
correlates the stream of expected message bits with the known sequence to identify the
presence of a variable digital watermark message, which follows the known sequence at a
known time offset or resides in different frequency or time frequency location at a known offset
relative to the location (frequency band and time) of the known sequence. Error correction
decoding and error detection may be used to further verify the presence of a valid message
comprised of variable symbols or a combination of variable and fixed symbols.
The digital watermark embedder of an echo-based scheme can be improved by analyzing
the host audio signal for data hiding opportunities that enhance detectability of the digital
watermark. For example, in one implementation of an echo data hiding scheme, the embedder
analyzes the spectral content of the host audio signal for spectrally flat areas and then encodes a
synch echo signal to identify the start of an embedded signal. In the detector, a pre-processor
performs a similar spectral analysis to identify segments of the audio signal to search for the
synch echo signal. This enables the detector to quickly and efficiently isolate segments of the
audio signal, where a digital watermark is likely to be recoverable.

Preferably, the strong synch echo signal and the known message symbols used for
calibration should be used together. The synch echo provides compensation for time scale
errors, while the known message symbols provide further refinement for timing errors,
particularly timing errors that remain after compensating for scale errors. The known symbols
are preferably encoded at repeated time intervals so that the detector can update the timing
information periodically. In some applications, the synch echo can be used to carry one or
more bits of information, such as by varying the sign of the autocorrelation function, or
mapping the magnitude of the autocorrelation to a particular symbol based on a predetermined
mapping between the autocorrelation and corresponding symbols. This enables the synch echo
to provide variable symbol information, such as detector version information, message format
type, etc. The detector can then be programmed to execute message decoding functions, and
interpret the resulting bits differently depending on the message carried in the synch echo.
A number of techniques can be used to reduce false positives in message detection. In one
specific implementation, the digital watermark embedder hides the same data in the
autocorrelation signal of a 1-3 kHz band and 3-8 kHz band of a host audio signal using
different random places to change the autocorrelation peak in each band. This reduces the
possibility of the detector mistaking a natural echo in equipment as hidden embedded auxiliary
data.

In another implementation, the embedder hides the same data at two spots in the
autocorrelation signal within one band, thus reducing the likelihood of natural echoes creating
false readings.

As a further extension, the above implementations can include changes in
autocorrelation amplitude as well as adding echoes at specific times to represent binary
symbols, 0s and 1s, such as an echo at 0.5 ms for a symbol 1 and 1 ms for a symbol 0, then 0.3
ms for a symbol 0 and 0.8 ms for a symbol 1 in the next time segment, etc.

The autocorrelation signal can be quantized such that the autocorrelation value falls within
predetermined bins associated with message symbols. The embedder selects the bin for
quantizing the data such that the minimum change is made to the host signal. The detector then
evaluates the autocorrelation function of the received signal in the appropriate band and
determines the corresponding quantization bin in which it falls. This bin indicates the message
symbol. Further error correction decoding may be performed on the decoded message symbols
to reduce the bit error rate.
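
The bin selection can be sketched with a simple scalar quantizer. The sketch assumes uniform bins of width `step` in which even-numbered bins stand for symbol 0 and odd-numbered bins for symbol 1; this mapping, the step size and the function names are assumptions. How the host audio is modified to reach the target autocorrelation value is outside the scope of the sketch.

```python
import math

def quantize_to_symbol(value, symbol, step):
    """Centre of the nearest bin whose index parity equals the symbol (0 or 1)."""
    index = math.floor(value / step)
    if index % 2 == symbol:
        return (index + 0.5) * step
    # The current bin has the wrong parity: pick the closer neighbouring bin.
    lower_centre = (index - 0.5) * step
    upper_centre = (index + 1.5) * step
    if value - lower_centre <= upper_centre - value:
        return lower_centre
    return upper_centre

def read_symbol(value, step):
    """Detector side: the parity of the bin the value falls in is the symbol."""
    return math.floor(value / step) % 2

step = 0.25
target_for_one = quantize_to_symbol(0.8, 1, step)    # 0.875
target_for_zero = quantize_to_symbol(0.8, 0, step)   # 0.625
```

Choosing the nearest bin of the right parity is one way to keep the change to the host signal small, in the spirit of the minimum-change selection described above.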
Concluding Remarks



The watermarking systems described above can be used to embed auxiliary information, 
including control instructions, metadata, or links to metadata and instructions, in audio, video, 
or combined audio and video signals. For related information on such applications for using
watermarks to link watermarked content to information or actions, see US Patent No. 5,841,978
and US application nos. 09/571,422; 09/563,664; and 09/574,726.

The methods, processes, and systems described above may be implemented in hardware,
software or a combination of hardware and software. For example, the watermark encoding
processes may be implemented in a programmable computer or a special purpose digital circuit.
Similarly, watermark decoding may be implemented in software, firmware, hardware, or
combinations of software, firmware and hardware. The methods and processes described
above may be implemented in programs executed from a system's memory (a computer
readable medium, such as an electronic, optical or magnetic storage device).

While the invention has been shown and described as applied to media signals with temporal 
components like audio and video signals, a process of down-sampling to facilitate the 
application of a relatively small and efficient transform could be applied to other types of media 
signals such as still images, graphics, etc.

To provide a comprehensive disclosure without unduly lengthening the specification, applicants
incorporate by reference the patents and patent applications referenced above. The particular
combinations of elements and features in the above-detailed embodiments are exemplary only;
the interchanging and substitution of these teachings with other teachings in this and the
incorporated-by-reference patents/applications are also contemplated.

While the invention has been shown and described with respect to preferred embodiments 
thereof, it should be understood that various changes in form and detail can be made without
departing from the spirit and scope of the invention.

1. A method of watermarking a media signal with a temporal component, the method
comprising:

dividing at least a portion of the media signal into segments,

processing each segment with the following actions,

moving a window along the media signal in the segment and repeatedly applying a
frequency transform to the media signal in each window to generate a time-frequency
representation,

computing a perceptually adaptive watermark in the time-frequency domain,

converting the watermark signal to the time domain using an inverse frequency
transform and

repeating the process until each segment has been processed,

adding the watermark signal to the media signal to generate a watermarked media
signal.

2. The method of claim 1 comprising:

dividing said media signal into blocks, where each block is divided into segments.

3. The method of claim 1 wherein the window is a Hamming window.

4. The method of claim 1 comprising:

down-sampling the segments before computing the time frequency representation;
upsampling the converted watermark signal; wherein adding the media signal comprises adding
the upsampled watermark signal to the media signal.

5. The method of claim 1 wherein said frequency transform comprises a Fourier
transform.

6. The method of claim 1 wherein the media signal comprises an audio signal.

7. A computer readable medium having software for performing the method of claim 1.


WO 02/23883 PCT/US01/28927



8. A method of watermarking a media signal with a temporal component, the method
comprising:

dividing the media signal into segments,
transforming each segment into a time-frequency spectrogram,
computing a time-frequency domain watermark signal based on the time frequency
spectrogram,

combining the time-frequency domain watermark signal with the media signal to
produce a watermarked media signal.

9. The method of claim 8 wherein the time frequency domain watermark signal is
computed based on a perceptual analysis of the time-frequency spectrogram.

10. A computer readable medium having software for performing the method of claim 8.

11. A system for watermarking a media signal comprising:
means for dividing a media signal into segments;

means for transforming the segments into a time frequency representation;
means for computing a watermark from the time frequency representation of the media
signal and an auxiliary message to be encoded in the watermark; and means for combining the
watermark with the media signal to create a watermarked signal.

12. The system of claim 10 wherein the media signal comprises an audio signal.

13. A method of decoding a watermark from a media signal comprising:

transforming the media signal to a time frequency representation;
computing elements of a message signal embedded into the media signal from the time
frequency representation; and

decoding a message from the elements.

14. The method of claim 13 including: spread spectrum demodulating the message
from the message elements.

15. A computer readable medium having software for performing the method of claim 13.

16. A watermark decoder comprising:

a detector for determining whether a watermark is present in the media signal and
determining an alignment and scale of the watermark; and

a reader for decoding an auxiliary message embedded in a time frequency
representation of the media signal.

17. A method of time-frequency domain watermarking of a media signal comprising:
converting the media signal into a time-frequency domain representation,
embedding a watermark in the time frequency domain representation to produce a
watermarked signal in the time frequency domain, and

converting the watermarked signal from the time-frequency domain into a domain in
which the signal is perceived by viewers or listeners.

18. The method of claim 17 wherein the embedding includes embedding using a
perceptual model that identifies noisy areas in the time frequency representation and hides the
watermark data around the noisy areas in the time frequency representation.

19. The method of claim 17 wherein the conversion to the time-frequency domain
includes dividing the signal into blocks and separately transforming the blocks into the time-
frequency domain.

20. The method of claim 19 wherein a watermark message comprising two or more
message symbols is embedded into each block.

21. The method of claim 17 wherein a calibration signal is embedded in the media
signal to assist a watermark decoder in determining scaling and alignment of the watermark in a
potentially distorted version of the watermarked media signal.

22. The method of claim 21 wherein the calibration signal includes a signal with
distinct impulse functions in the magnitude of the Fourier domain.

23. The method of claim 17 wherein the media signal is an audio signal, the audio
signal is down-sampled before the watermark is embedded, the watermark is isolated as a time
domain signal, and the isolated, time domain watermark is added to the original media signal in
the time domain.

24. The method of claim 23 wherein the isolation of the watermark includes
subtracting an unmarked version of the media signal from the watermarked version of the
media signal in the time-frequency domain and subsequently converting the time-frequency
representation of the watermark back into the time domain.

25. The method of claim 23 wherein the isolation of the watermark includes
computing a time-frequency domain watermark signal from the time-frequency representation
of the media signal.

26. A method of watermarking a media signal comprising:

transforming the media signal from a perceptual domain to a transform domain,
computing a watermark signal from the transformed signal,
isolating the watermark signal, and

adding the watermark to the media signal in the perceptual domain.

27. The method of claim 26 wherein transforming includes down-sampling the media
signal, and wherein isolating the watermark signal includes up-sampling the watermark signal
before adding the upsampled watermark signal to the media signal.

28. The method of claim 27 wherein transforming includes separating the media signal
into blocks and converting each block into the time-frequency domain, and performing an
inverse time-frequency transform when isolating the watermark signal.

29. The method of claim 28 wherein the time-frequency transform includes taking fast
Fourier transforms of windowed segments of each block and the isolation of the watermark
includes taking inverse Fourier transforms of each segment and adding the segments together in
the perceptual domain.

30. A method of decoding a watermark from a media signal comprising:

transforming the media signal into the time-frequency domain and reading the
watermark from the transformed media signal.

31. The method of claim 30 wherein transforming includes separating the signal into
blocks.

32. The method of claim 30 wherein transforming includes using a Fourier transform
on windowed segments within each block.

33. The method of claim 30 including:

using a calibration signal to determine the correct scaling and alignment before
converting to the time-frequency domain.

34. The method of claim 33 wherein the calibration signal comprises a set of impulse
functions in a Fourier domain.

35. A computer readable medium having software for performing the method of claim 30.

Figure 4A
Figure 5B
Figure 6